BMC Medical Research Methodology

41 training papers 2019-06-25 – 2026-03-07

Top medRxiv preprints most likely to be published in this journal, ranked by match strength.

1
Does the type of publisher response to integrity concerns influence subsequent citations? A cohort study.
2026-02-27 health informatics 10.64898/2026.02.25.26346683
#1 (6.5%)

Background: Journals may respond to integrity concerns by publishing an editorial response (editorial notice, expression of concern (EoC) or retraction). We investigated whether the type of editorial response affected citation rates. Methods: We obtained citations for 172 randomised controlled trials (RCTs) with integrity concerns (41 had editorial notices, 38 EoCs and 23 retractions) and control RCTs from the same journal and year. Monthly citation rates up to 60 months before and after editorial ...

2
Integrating stakeholder perspectives in modeling routine data for therapeutic decision-making
2026-02-18 epidemiology 10.64898/2026.02.18.26346074
Top 0.2% (5.7%)

Background: Routinely collected health data are increasingly used to generate real-world evidence for therapeutic decision-making. Yet, stakeholders, including clinicians, pharmaceutical industry representatives, patient advocacy groups, and statisticians, prioritize different aspects of data quality, analysis, and interpretation. Without explicit consideration of these perspectives, analyses risk being fragmented, misaligned with end-user needs, or lacking transparency. Methods: We developed a sta...

3
Standardisation of terminology, calculation and reporting for assigning exposure duration to drug utilisation records from healthcare data sources: the CreateDoT framework
2026-02-19 epidemiology 10.64898/2026.02.18.26346576
Top 0.2% (5.7%)

Background: In pharmacoepidemiological studies, days of treatment (DoT) duration associated with individual electronic drug utilization records (DUR) are usually missing. Researcher-defined duration (RDD) calculation approaches, as opposed to data-driven approaches, can be used to estimate DoT based on the specific choices and assumptions made by investigators. These are usually underreported or even undocumented. We aimed to develop a framework for the standardization of terminology, formulas, im...

4
Collaborative large language models (LLMs) are all you need for screening in systematic reviews
2026-02-17 health informatics 10.64898/2026.02.07.26345640
Top 0.2% (5.6%)

Background: The ability of large language models (LLMs) to work collaboratively and screen studies in a systematic review (SR) is under-explored. Hence, we aimed to evaluate the effectiveness of LLMs in automating the process of screening in systematic reviews. Methods: This is an observational study which included labeled data (titles and abstracts) for five SRs. Originally, two reviewers screened the citations independently for eligibility. A third reviewer cross-checked each citation for quality ...

5
Time-to-retraction and likelihood of evidence contamination (VITALITY Extension I): a retrospective cohort analysis
2026-02-24 epidemiology 10.64898/2026.02.20.26346631
Top 0.3% (5.1%)

Background: The number of problematic randomized clinical trials (RCTs) has risen sharply in recent decades, posing serious challenges to the integrity of the healthcare evidence ecosystem. Objective: To investigate whether retraction of problematic RCTs could reduce evidence contamination. Design: Retrospective cohort study. Setting: A secondary analysis of the VITALITY Study database. Participants: 1,330 retracted RCTs with 847 systematic reviews. Measurements: The difference in the median number (and...

6
Outcome Risk Modeling for Disability-Free Longevity: Comparison of Random Forest and Random Survival Forest Methods
2026-02-17 health informatics 10.64898/2026.02.13.26346264
Top 0.3% (4.9%)

Background: When creating risk prediction models for time-to-event data, methods that incorporate time are typically used. Random survival forests (RSF), an extension of random forests (RF), are one such class of models. We compared RSF to RF in the context of time-to-event outcomes in the ASPirin in Reducing Events in the Elderly (ASPREE) randomized controlled trial. We hypothesize that RSF will have superior discrimination and calibration versus RF. Methods: Participants from ASPREE residing outs...

7
Early Detection of Absurdity Signals in Pharmacovigilance: A Machine Learning Ensemble Approach to Identify Rare Adverse Drug Reactions
2026-02-09 health informatics 10.64898/2026.02.06.26345783
Top 0.3% (4.9%)

Background: Traditional pharmacovigilance methods based on biostatistical approaches systematically exclude outliers and rare events, potentially missing critical safety signals. These methods fail to detect micro-clusters of adverse events and comorbidity patterns that may indicate serious but low-frequency adverse drug reactions (ADRs). We introduce the concept of absurdity signal detection - the identification of statistically anomalous but clinically significant adverse event patterns that co...

8
An E-value-Informed Sensitivity Analysis Framework for Hybrid Controlled Trials
2026-03-06 epidemiology 10.64898/2026.03.05.26347653
Top 0.4% (4.6%)

Hybrid controlled trials (HCTs) incorporate real-world data into randomized controlled trials (RCTs) by augmenting the internal control arm with patients receiving the same treatment in routine care. Beyond increasing power, HCTs may improve recruitment by supporting unequal randomization ratios that increase patient access to experimental treatments. However, HCT validity is threatened by bias from unmeasured confounding due to lack of randomization of external controls, leading to outcome non-...

9
Systematic reviews in minutes to hours using artificial intelligence
2026-02-10 health informatics 10.64898/2026.02.06.26345764
Top 0.4% (4.2%)

Systematic reviews are used in academia, biotechnology, pharmaceutical companies and government to synthesise and appraise large numbers of publications. The current (largely manual) workflow takes an average of 9-18 months [1], at a cost of $100,000+ per review [2]. We built a platform, ScholaraAI, that leverages artificial intelligence to cut this to < 0.1% of the time, without compromising quality. ScholaraAI facilitates end-to-end systematic reviews: search, screening, data extraction, and analysi...

10
Show Your Work: Verbatim Evidence Requirements and Automated Assessment for Large Language Models in Biomedical Text Processing
2026-03-04 health informatics 10.64898/2026.03.03.26346690
Top 0.4% (4.1%)

Purpose: Large language models (LLMs) are used for biomedical text processing, but individual decisions are often hard to audit. We evaluated whether enforcing a mechanically checkable "show your work" quote affects accuracy, stability, and verifiability for trial eligibility-scope classification from abstracts. Methods: We used 200 oncology randomized controlled trials (2005-2023) and provided models with only the title and abstract. Trials were labeled with whether they allowed for the inclusio...

11
Fully Automated Systematic Review Generation via Large Language Models: Quality Assessment and Implications for Scientific Publishing
2026-02-23 health informatics 10.64898/2026.02.18.26346559
Top 0.5% (4.0%)

Large language models (LLMs) are increasingly transforming scientific workflows, yet their application to rigorous evidence synthesis remains underexplored. Through the execution of a single Python script, we present a fully automated pipeline leveraging the Claude API to generate systematic reviews from literature search through manuscript completion without human intervention. Our pipeline processes hundreds of papers through iterative API calls for inclusion evaluation, information extraction...

12
Agentic Trial Emulation to Learn Health System-specific Drug Effects At Scale
2026-02-20 health informatics 10.64898/2026.02.19.26346539
Top 0.5% (4.0%)

Objective: Electronic Health Record (EHR)-based trial emulation can support translation of randomized clinical trial (RCT) evidence into practice, yet emulations often diverge from published RCT results. We hypothesized that these discrepancies are structured and learnable properties of a health system's data-generating process, and that autonomous agentic workflows can generate discrepancies at the scale required for cumulative learning. Materials and Methods: We developed an agentic trial emulatio...

13
The Independence of Discrimination and Calibration in Clinical Risk Prediction: Lessons from a Multi-Timeframe Diabetes Prediction Framework
2026-02-14 health informatics 10.64898/2026.02.12.26346147
Top 0.6% (3.8%)

Background: Clinical risk prediction models are typically evaluated by discrimination (area under the receiver operating characteristic curve, AUC), with calibration receiving less attention. We developed a multi-timeframe diabetes prediction framework emphasizing calibration and used synthetic data validation to investigate whether good discrimination guarantees good calibration. Methods: We generated 500,000 synthetic patients using published epidemiological parameters from QDiabetes-2018, FINDRI...

14
Removing animal and nonhuman records in Ovid Embase: A comparison of 11 filters
2026-02-17 health informatics 10.64898/2026.02.13.26346239
Top 0.6% (3.7%)

Introduction: Several filters are routinely used to remove animal or nonhuman records in Ovid Embase, despite there being no performance data for them. The filters take different approaches in design. Objective: To understand and compare the impact of 11 filters to remove animal or nonhuman records in Ovid Embase. To understand the indexing of relevant subject headings in Embase. Methods: To assess filter performance, we screened and categorised 3,000 records as should be removed or should be reta...

15
Causal Effects of Natural Language Processing-Enhanced Clinical Decision Support on Early Cognitive Impairment Detection: A Propensity Score Analysis Using Inverse Probability of Treatment Weighting
2026-02-11 health informatics 10.64898/2026.02.10.26345968
Top 0.7% (3.6%)

Background: Natural language processing (NLP) systems integrated into clinical workflows show promise for detecting early cognitive impairment, yet causal evidence from real-world implementation remains limited. Observational studies comparing outcomes between hospitals with and without NLP-enhanced clinical decision support (CDS) systems face significant confounding from systematic differences in patient populations and institutional characteristics. Objective: To estimate the causal effect of NLP...

16
The Causal Impact of Natural Language Processing-Driven Clinical Decision Support on Sepsis Mortality in England: An Augmented Synthetic Control Analysis of NHS Trust-Level Data
2026-03-02 health informatics 10.64898/2026.02.27.26347253
Top 0.8% (2.9%)

Background: Sepsis remains a leading cause of preventable hospital mortality in England, with NHS England reporting over 48,000 sepsis-related deaths annually. Natural language processing (NLP)-driven clinical decision support systems (CDSS) have been deployed in several NHS Trusts to enable automated early detection of sepsis from unstructured clinical notes, yet causal evidence of their effectiveness at the hospital level remains limited. Objective: To estimate the causal effect of implementing N...

17
Evaluating a Locally Deployed 20-Billion Parameter Large Language Model for Automated Abstract Screening in Systematic Reviews
2026-03-04 health informatics 10.64898/2026.03.04.26347506
Top 0.8% (2.8%)

Background: Systematic reviews (SRs) are essential for evidence-based medicine but require extensive time and resources for abstract screening. Large language models (LLMs) offer potential for automating this process, yet concerns about data privacy, intellectual property protection, and reproducibility limit the use of cloud-based solutions in research settings. Objective: To evaluate the performance of a locally deployed 20-billion parameter LLM for automated abstract screening in systematic revi...

18
A Governance-Driven, Real-World Data-Calibrated Health Informatics Framework for Longitudinal Utilization Forecasting in Oncology and Complex Chronic Conditions
2026-02-26 health informatics 10.64898/2026.02.23.26346919
Top 0.8% (2.7%)

Background: Healthcare utilization forecasting systems are often derived from static, annualized market share assumptions that fail to represent real-world treatment dynamics. Such approaches systematically misestimate future utilization by ignoring longitudinal treatment sequencing, discontinuation with surveillance, recurrence-driven re-entry, and provider adoption dynamics. Objective: This study proposes a reusable, governance-driven health informatics forecasting framework designed to generate ...

19
Blockchain-Enabled Health Information Exchange Efficiency Across South Korean Hospital Networks: A Stochastic Frontier Analysis with Bayesian Model Averaging
2026-02-26 health informatics 10.64898/2026.02.24.26347051
Top 0.8% (2.7%)

Background: South Korea's healthcare system, while technologically advanced, faces persistent inefficiencies in health information exchange (HIE) across its fragmented hospital network. Blockchain technology has been proposed as a decentralised infrastructure for secure, interoperable health data sharing, yet empirical evidence quantifying the efficiency gains attributable to blockchain-based HIE systems at the hospital network level remains absent. Traditional performance metrics fail to distingui...

20
Comparing AI and Human Coding of NIH Grant Abstracts to Identify Innovations in Opioid Addiction Treatment
2026-02-17 health informatics 10.64898/2026.02.13.26346235
Top 0.9% (2.7%)

Large language models (LLMs) are increasingly used for qualitative analysis in substance use research, yet their performance relative to human coders remains underexplored. This study compares ChatGPT-4.0 with human coders in identifying and describing the core innovation of NIH grants focused on reducing opioid overdose. A total of 118 NIH HEAL Initiative grant abstracts were independently coded by ChatGPT and humans to generate innovation descriptions, which were then evaluated by both human r...